
\section{Source data}

	Lyngsoe Systems has established an RFID-based system that supports baggage handling at a number of airports in Scandinavia. The information gathered by this system can be used not only to track bags, but also to diagnose problems at the airports, analyze baggage flows and determine peak days and periods, among other things. This section explains how the collected data is organized in the original data sources.

\subsection*{Explanation of source data schema} 

	The data schema for bag tracking, shown in Figure~\ref{original}, contains information about bags and their locations at specific points in time. The information is distributed over four tables: \textit{reading}, \textit{location}, \textit{site} and \textit{route\_leg}. The \textit{reading} table holds individual bag readings (controller, antenna, license plate, flight date, originating airport and insertion timestamp). Each reading is related to a \textit{location} (described by location name, latitude, longitude and information about who modified the record), and each location belongs to a \textit{site} (country, site name, timezone and information about who modified the record). Not all of this information is needed for business intelligence: the schema is suitable for the operation of the BagTrack system, but the large volume of detailed data makes it inappropriate for business intelligence analysis.

	\begin{figure*}[h!tb]
		\includegraphics[width=\textwidth]{parts/images/dbSchemav2.pdf}
		\caption{Data schema of Lyngsoe Systems}
		\label{original}
	\end{figure*}
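	To make the relationships between the tables concrete, the following Python sketch models three of the source tables as dataclasses and follows a reading to its site. The field names and the helper \texttt{site\_of} are hypothetical: they are derived from the description above rather than from the actual column names in Figure~\ref{original}, and \textit{route\_leg} is left out because its fields are not described here.

\begin{verbatim}
from dataclasses import dataclass
from datetime import datetime

# Hypothetical field names based on the prose description of the
# source schema; the real column names may differ.

@dataclass
class Site:
    site_id: int
    country: str
    name: str
    timezone: str
    modified_by: str  # change-logging (audit) information

@dataclass
class Location:
    location_id: int
    site_id: int       # each location belongs to one site
    name: str
    latitude: float
    longitude: float
    modified_by: str   # change-logging (audit) information

@dataclass
class Reading:
    reading_id: int
    location_id: int   # each reading is taken at one location
    controller: str
    antenna: str
    license_plate: str
    flight_date: str
    origin_airport: str
    inserted_at: datetime

def site_of(reading: Reading,
            locations: dict[int, Location],
            sites: dict[int, Site]) -> Site:
    """Follow the reading -> location -> site relationships."""
    location = locations[reading.location_id]
    return sites[location.site_id]
\end{verbatim}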

	The source schema has a limitation in that a bag's planned route can include at most six airports. To stay compatible with possible future updates of the source schema, and to save space by not storing fields that are almost always empty (most routes have only two or three legs), our schema will take a different approach to storing routes, as explained in Section~\ref{sec:modelling}.
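	As a rough illustration of the difference between fixed route fields and one row per leg, the Python sketch below turns a fixed-width route of up to six airport slots into a list of legs. Representing the source route as a list, marking empty slots with \texttt{None} and the name \texttt{route\_to\_legs} are assumptions made for illustration; they are not the source schema's actual columns.

\begin{verbatim}
from typing import Optional

MAX_ROUTE_AIRPORTS = 6  # limitation of the source schema

def route_to_legs(airports: list[Optional[str]]) -> list[tuple[int, str, str]]:
    """Turn a fixed-width route (up to six airport slots, empty slots
    given as None) into one (leg_number, from_airport, to_airport)
    row per leg. Empty slots are skipped."""
    if len(airports) > MAX_ROUTE_AIRPORTS:
        raise ValueError("source schema stores at most six airports")
    stops = [a for a in airports if a is not None]
    # n stops give n - 1 legs; legs are numbered from 1
    return [(i + 1, stops[i], stops[i + 1]) for i in range(len(stops) - 1)]
\end{verbatim}

	With one row per leg, a two-airport route needs a single row instead of six mostly empty fields, and no upper bound on the number of airports is built into the structure.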
	
	We must also take into account that the bag tracking system is not deployed at all airports. A bag's route may therefore pass through an airport without tracking, in which case we will not know the entire path of the bag.
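	The consequence can be illustrated with a small Python sketch that lists the stops of a planned route at which no readings can be expected. The set of tracked airports and both function names are hypothetical; in practice such a set could presumably be derived from the sites present in the source data.

\begin{verbatim}
def untracked_stops(route: list[str], tracked_airports: set[str]) -> list[str]:
    """Return the stops of a planned route at airports without tracking,
    i.e. where no readings for the bag can be expected."""
    return [airport for airport in route if airport not in tracked_airports]

def path_fully_observable(route: list[str], tracked_airports: set[str]) -> bool:
    """True only if every stop on the route has tracking."""
    return not untracked_stops(route, tracked_airports)
\end{verbatim}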
	
	Furthermore, the source schema has fields whose contents have no value for business intelligence. For example, changes to the data are logged: almost all tables record the last user, machine and application that modified each record. This information is necessary in the operational database to keep track of changes, but it is of no use for business analysis.
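	Removing such change-logging fields during extraction amounts to a simple projection. The Python sketch below assumes that source rows are available as dictionaries; the audit column names in \texttt{AUDIT\_COLUMNS} are hypothetical placeholders for whatever names the source schema actually uses.

\begin{verbatim}
# Hypothetical audit column names; the actual names may differ.
AUDIT_COLUMNS = {
    "modified_by_user",
    "modified_by_machine",
    "modified_by_application",
}

def strip_audit_fields(row: dict) -> dict:
    """Return a copy of a source row without the change-logging columns.
    The original row is left unchanged."""
    return {col: val for col, val in row.items() if col not in AUDIT_COLUMNS}
\end{verbatim}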
	
	The source schema makes the data easy to manipulate and quick to query for operational purposes, but it is not suitable for online analytical processing (OLAP), reporting and decision making. To make the data more suitable for a data warehouse, we will remove information that is not relevant for analysis, extract dimensions and propose a schema for the data warehouse.
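	One common way to extract a dimension, sketched here in Python under the assumption that source rows are dictionaries, is to collect the distinct combinations of the attributes that describe it and give each combination a surrogate key. The function names and the choice of consecutive integer keys starting at 1 are illustrative, not the design proposed in this report.

\begin{verbatim}
def extract_dimension(rows: list[dict], key_columns: list[str]) -> dict[tuple, int]:
    """Map each distinct combination of key_columns (e.g. country and
    site name for a site dimension) to a surrogate key, numbered from 1
    in order of first appearance."""
    dimension: dict[tuple, int] = {}
    for row in rows:
        natural_key = tuple(row[c] for c in key_columns)
        if natural_key not in dimension:
            dimension[natural_key] = len(dimension) + 1
    return dimension

def to_surrogate(row: dict, key_columns: list[str], dimension: dict[tuple, int]) -> int:
    """Look up the surrogate key a fact row should reference."""
    return dimension[tuple(row[c] for c in key_columns)]
\end{verbatim}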